Abstract. We show how to construct a topological Markov map
of the interval whose invariant probability measure is the stationary
law of a given stochastic chain of infinite order. In particular we
characterize the maps corresponding to stochastic chains with memory
of variable length. The problem treated here is the converse
of the classical construction of the Gibbs formalism for Markov
expanding maps of the interval.

1. Introduction. 

The founding papers by Bowen (2008), Ruelle (1978) and Sinai
(1972) explained how to use the Gibbs formalism for Markov expanding
maps of the interval. In this formalism, each such map of the
interval is associated with a Gibbs measure which corresponds, through
the dynamical coding, to an absolutely continuous invariant measure.
Recalling that Gibbs measures with Hölder continuous interactions are
stochastic chains of infinite order (cf. Fernández and Maillard 2004 and
references therein), this means that expanding maps of the interval
are naturally associated with stochastic chains. In particular, piecewise
affine topological Markov maps correspond to Markov chains on a finite
alphabet.

In this paper we address the converse problem, namely, given a stochastic
chain of infinite order, taking values on a finite alphabet, can
we construct a topological Markov map of the interval whose invariant
measure is the invariant probability measure of the chain?

A particular case of this question has to do with the class of stochastic
chains with memory of variable length, introduced by Rissanen (1983).
Recently Cénac et al. [1] have shown how to represent two interesting
examples of stochastic chains with memory of variable (unbounded)
length by maps of the interval. Inspired by this paper, we discuss at
a more general level some of the relations between chains of infinite
order, chains with memory of variable length and expanding
Markov maps of the interval.

This paper is organized as follows. In Section 2 we briefly present the
notions of expanding maps of the interval and stochastic chains of infinite
order and, for the convenience of the reader, we recall some classical
results. In Section 3 we recall the classical construction of a stochastic
chain of infinite order given an expanding map of the interval. For
more details about this construction we refer the reader to
[16], [17] and [9] and references therein. In Section 4 we explain how
to construct an expanding map of the interval given a stochastic chain
of infinite order. Finally, in Section 5 we study the particular case of
stochastic chains with memory of variable (unbounded) length.

2. Notation, chains and maps. 

In order to make this paper as self-contained as possible, we
gather in this section some basic definitions and results about stochastic
chains and maps of the interval.

Let $A$ denote a finite alphabet $A = \{1, \dots, \mathcal{K}\}$. Given two integers
$m \le n$ we denote by $w_m^n$ the sequence $(w_m, \dots, w_n)$ of symbols in $A$,
and $A_m^n$ denotes the set of such sequences. Any sequence $w_m^n$ with
$m > n$ represents the empty string. The same notation is extended to
the cases $m = -\infty$ and $n = +\infty$.

Given two finite sequences $w$ and $v$ we will denote by $vw$ the sequence
obtained by concatenating the two strings. For example, $z_{-\infty}^{-1} a$ denotes
the sequence having the symbol $a$ at the zero position and the symbols
$z_i$ at the positions $i \le -1$.

For a finite string $a_m^n \in A_m^n$, we denote by $C(a_m^n)$ the cylinder given by

\[ C(a_m^n) = \{ x_{-\infty}^{+\infty} \in A_{-\infty}^{+\infty} : x_m^n = a_m^n \} . \]

2.1. Stochastic chains of infinite order. A family $p$ of numbers
$p(a \mid x_{-\infty}^{-1}) \in [0, 1]$, with $a \in A$ and $x_{-\infty}^{-1} \in A_{-\infty}^{-1}$, is called a family of
transition probabilities if it satisfies the two conditions

• For each fixed sequence $x_{-\infty}^{-1}$,

\[ \sum_{a \in A} p(a \mid x_{-\infty}^{-1}) = 1 . \]

• For each symbol $a \in A$, the map

\[ x_{-\infty}^{-1} \mapsto p(a \mid x_{-\infty}^{-1}) \]

is measurable with respect to the product sigma-algebra on $A_{-\infty}^{-1}$.

Definition 2.1. A family $p$ of transition probabilities satisfies the
condition of non-nullness if

\[ \inf\{ p(a \mid x_{-\infty}^{-1}) : a \in A ,\ x_{-\infty}^{-1} \in A_{-\infty}^{-1} \} > 0 . \]

Definition 2.2. The continuity rate of a family $p$ of transition
probabilities is the sequence $(\beta_k)_{k \ge 1}$ defined by

\[ \beta_k = \sup\{ |p(a \mid x_{-\infty}^{-1}) - p(a \mid y_{-\infty}^{-1})| : a \in A ,\ x_{-\infty}^{-1}, y_{-\infty}^{-1} \in A_{-\infty}^{-1} \text{ with } x_{-k}^{-1} = y_{-k}^{-1} \} . \]

Definition 2.3. The family $p$ of transition probabilities with continuity
rate $(\beta_k)$ is said to be continuous if

\[ \lim_{k \to +\infty} \beta_k = 0 . \]
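As an illustration of Definitions 2.1 to 2.3, the following sketch computes the continuity rate by brute force for a hypothetical family that depends only on the last two symbols of the past. The family `p`, the alphabet and the helper names are assumptions made for this example, not objects taken from the text; for such a family the supremum can be taken over finite pasts.

```python
from fractions import Fraction
from itertools import product

ALPHABET = (1, 2)

def p(a, past):
    """Hypothetical transition probability p(a | past).

    `past` is a tuple ordered from oldest to most recent symbol, so
    past[-1] is x_{-1} and past[-2] is x_{-2}; older symbols are ignored.
    """
    prob_one = Fraction(1, 5)
    if past[-1] == 1:
        prob_one += Fraction(3, 10)
    if past[-2] == 1:
        prob_one += Fraction(1, 10)
    return prob_one if a == 1 else 1 - prob_one

def continuity_rate(k, depth=2):
    """beta_k for a family depending on at most `depth` past symbols (k <= depth)."""
    best = Fraction(0)
    for recent in product(ALPHABET, repeat=k):             # shared block x_{-k}^{-1}
        for far_x in product(ALPHABET, repeat=depth - k):  # older symbols of x
            for far_y in product(ALPHABET, repeat=depth - k):
                for a in ALPHABET:
                    gap = abs(p(a, far_x + recent) - p(a, far_y + recent))
                    best = max(best, gap)
    return best

def non_nullness_bound(depth=2):
    """Infimum of p(a | past) over all symbols and pasts of length `depth`."""
    return min(p(a, past)
               for a in ALPHABET
               for past in product(ALPHABET, repeat=depth))
```

For this family `continuity_rate(1)` is 1/10 and `continuity_rate(2)` is 0, so the rate is summable, as it is for any Markov chain of finite order, and `non_nullness_bound()` is 1/5, so the family is non-null.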

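Definition 2.4. We will say that a probability measure $P$ on $A^{\mathbb{Z}}$ is
translation invariant (or stationary) if for any $m \ge 0$ and for any
$a_0^m \in A_0^m$, we have

\[ P\{ x \in A^{\mathbb{Z}} : x_n^{n+m} = a_0^m \} = P(C(a_0^m)) \]

for any $n \in \mathbb{Z}$.

The notion of translation invariance says that the probability measure
$P$ is invariant with respect to the shift $\mathcal{S} : A^{\mathbb{Z}} \to A^{\mathbb{Z}}$, defined as
follows. For every sequence $z = z_{-\infty}^{+\infty}$, we have

\[ (\mathcal{S} z)_n = z_{n-1} . \]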

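Definition 2.5. We will say that a probability measure $P$ on $A^{\mathbb{Z}}$ is
invariant with respect to $p$ if, for any continuous function $f : A_{-\infty}^{0} \to \mathbb{R}$,
we have

(2.1) \[ \int f(x_{-\infty}^{0}) \, dP(x_{-\infty}^{0}) = \int \sum_{a \in A} p(a \mid x_{-\infty}^{-1}) \, f(x_{-\infty}^{-1} a) \, dP(x_{-\infty}^{-1}) . \]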

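From stationarity and invariance of $P$ with respect to $p$ it follows
immediately that for any pair $m \le n$ of integers and for any $a_m^n \in A_m^n$,
we have

\[ P(C(a_m^n)) = \int \prod_{k=m}^{n} p(a_k \mid x_{-\infty}^{m-1} a_m^{k-1}) \, dP(x_{-\infty}^{m-1}) . \]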

For later reference, it is convenient to collect in the following theorem
some well-known results about families of transition probabilities.

Theorem 2.6. If the family of transition probabilities satisfies the
non-nullness condition 2.1 and the sequence of continuity rates is summable,
then there exists a unique ergodic stationary probability measure $P$ on
$A^{\mathbb{Z}}$, invariant with respect to $p$. This invariant probability measure has
no atom, and for any finite sequence $a_m^n$, $P(C(a_m^n)) > 0$.

This type of result has been proved by many authors, starting with
Onicescu and Mihoc (1935), Doeblin and Fortet (1937), Harris
(1955) and Comets et al. (2002).

Let us now consider the probability space having $A^{\mathbb{Z}}$ as sample space,
equipped with its product $\sigma$-algebra, and having $P$, whose existence is
granted by Theorem 2.6, as probability measure. We can define a
stochastic chain $(X_n)_{n \in \mathbb{Z}}$ on this probability space, by taking, for each
$n \in \mathbb{Z}$,

\[ X_n : A^{\mathbb{Z}} \to A \]

as the projection on the $n$th coordinate. In other words, for any $m \le n$
and any choice of the sequence $a_m^n$, we have

\[ P(C(a_m^n)) = P\{ X_m^n = a_m^n \} . \]

The stochastic chain $(X_n)_{n \in \mathbb{Z}}$ is said to be associated to the family of
transition probabilities $p$.

2.2. Piecewise expanding maps of the interval. From now on let
$\Omega = [0, 1]$. We first recall the definition of a piecewise expanding map of
the interval $\Omega$. Let $0 = \eta_0 < \eta_1 < \dots < \eta_{\mathcal{K}} = 1$ be a finite sequence and
for each interval $I_j = ]\eta_{j-1}, \eta_j[$, with $1 \le j \le \mathcal{K}$, let $T_j$ be a monotone
map from $I_j$ to $\Omega$ which extends to a $C^2$ map on $\bar{I}_j = [\eta_{j-1}, \eta_j]$. The
map $T$ is defined as follows. For each $\omega \in \Omega \setminus \{\eta_0, \eta_1, \dots, \eta_{\mathcal{K}}\}$,

\[ T(\omega) = T_j(\omega) , \quad \text{if } \omega \in I_j . \]

We denote by $\mathcal{P}$ the collection of open intervals $I_j$, with $j = 1, \dots, \mathcal{K}$,
and observe that it defines a partition of $\Omega \setminus \mathcal{N}_0$, where $\mathcal{N}_0 =
\{\eta_0, \eta_1, \dots, \eta_{\mathcal{K}}\}$. From now on let us call $A = \{1, \dots, \mathcal{K}\}$ the set of
indices of the partition $\mathcal{P}$.

Definition 2.7. The map $T$ of the interval has the (uniform) expanding
property if there is an integer $m > 0$ and a constant $c > 1$ such that at
any point where $T^m$ is differentiable we have

\[ |(T^m)'| > c . \]

Definition 2.8. The piecewise expanding map $T$ of the interval is said
to be topological Markov if for any $i = 1, \dots, \mathcal{K}$, the closure of $T(I_i)$ is
a union of closures of intervals $I_j$, $j \in \{1, \dots, \mathcal{K}\}$. The map $T$ is called
full topological Markov if $T_i(I_i) = \Omega$ for any $i = 1, \dots, \mathcal{K}$.

Note that the topological Markov property should not be confused
with the Markov property of stochastic processes.

Recall that the map $T$ is not defined on the finite set $\mathcal{N}_0$. Call $\mathcal{N}$
the set of pre-images of $\mathcal{N}_0$, namely

\[ \mathcal{N} = \mathcal{N}_0 \cup \bigcup_{k \ge 1} \{ \omega \in \Omega \mid T^k(\omega) \in \mathcal{N}_0 \} . \]

Given a full topological Markov expanding map of the interval $T$,
we define a coding of $\Omega \setminus \mathcal{N}$ with alphabet $A = \{1, \dots, \mathcal{K}\}$. This coding
is a map $W$ from $\Omega \setminus \mathcal{N}$ to $A^{\mathbb{N}}$ given by

\[ W_n(\omega) = j , \quad \text{if } T^n(\omega) \in I_j . \]

Given a full topological Markov expanding map of the interval $T$, we
have just associated a code to a point in $\Omega \setminus \mathcal{N}$. We can also go in the
opposite direction and this is the content of the next proposition. To
simplify the presentation we will restrict ourselves to the case of full
Markov maps. The extension to the case of general Markov maps is
straightforward.

Proposition 2.9. Assume $T$ is a full topological Markov expanding
map of the interval and $A$ is the set of indices of the partition $\mathcal{P}$.
Then given a code $x_0^{\infty} \in A_0^{\infty}$, there exists at most one point in the
interval $\Omega$ which is coded by this sequence.

Both directions are well known, see for instance [16] and [17].
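A minimal sketch of the coding and of Proposition 2.9, assuming a hypothetical piecewise affine full Markov map with two branches (the end points in `ETA` are chosen for illustration only). The function `code` iterates the map and records the visited intervals; `cylinder` pulls the unit interval back through the inverse branches. The resulting nested intervals shrink geometrically, which is why at most one point can carry a given infinite code.

```python
from fractions import Fraction

# Hypothetical end points 0 = eta_0 < eta_1 < eta_2 = 1 of an affine full map.
ETA = [Fraction(0), Fraction(1, 3), Fraction(1)]

def branch_index(omega):
    """Index j such that omega lies in the open interval ]eta_{j-1}, eta_j[."""
    for j in range(1, len(ETA)):
        if ETA[j - 1] < omega < ETA[j]:
            return j
    raise ValueError("omega is an end point, where T is not defined")

def T(omega):
    """Affine branch mapping I_j onto ]0, 1[."""
    j = branch_index(omega)
    return (omega - ETA[j - 1]) / (ETA[j] - ETA[j - 1])

def code(omega, n):
    """First n symbols W_0(omega), ..., W_{n-1}(omega) of the coding."""
    symbols = []
    for _ in range(n):
        symbols.append(branch_index(omega))
        omega = T(omega)
    return symbols

def cylinder(symbols):
    """Closed interval containing every point whose code starts with `symbols`."""
    low, high = Fraction(0), Fraction(1)
    for j in reversed(symbols):
        # Pull the current interval back through the inverse of branch j.
        length = ETA[j] - ETA[j - 1]
        low, high = ETA[j - 1] + low * length, ETA[j - 1] + high * length
    return low, high
```

With these end points, the length of `cylinder(word)` is the product of the lengths of the intervals visited by `word`, so it is at most $(2/3)^n$ for a word of length $n$.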



3. Constructing a chain from a map 

Let $\mu$ be a $T$-invariant measure defined on $\Omega = [0,1]$. We now have
the three ingredients of a probability space, namely the sample space
$\Omega = [0, 1]$, with its Borel $\sigma$-algebra and the $T$-invariant probability $\mu$.
Furthermore, the coding associated to the map $T$ defines a sequence of
random variables $(W_n)_{n \in \mathbb{N}}$ with values in the alphabet $A$.

Let us denote by $q(x_m^n)$ (with $m \le n$ belonging to $\mathbb{Z}$) the cylinder
probabilities on $A^{\mathbb{Z}}$, defined by

\[ q(x_m^n) = \mu\{ \omega \in \Omega : W_0(\omega) = x_n ,\ W_1(\omega) = x_{n-1} ,\ \dots ,\ W_{n-m}(\omega) = x_m \} . \]

The time was reversed in the definition of $q$ to follow the usual convention
for stochastic processes.

Kolmogorov's Existence Theorem implies that there exists a unique
stationary probability measure $P$ on $A^{\mathbb{Z}}$ such that for any integers $m \le n$,
and any sequence $x_m^n \in A_m^n$ we have

\[ P\{ X_m^n = x_m^n \} = q(x_m^n) , \]

where $X_n : A^{\mathbb{Z}} \to A$ is the projection on the $n$th coordinate. The process
$(X_n)_{n \in \mathbb{Z}}$ is in general a chain of infinite order. The next theorem will
give an explicit expression for its family of transition probabilities

\[ p(b \mid a_{-\infty}^{-1}) = P( X_0 = b \mid X_{-\infty}^{-1} = a_{-\infty}^{-1} ) . \]

We denote by $\lambda$ the Lebesgue measure of the interval $\Omega = [0, 1]$.

Theorem 3.1. Let $\Omega = [0,1]$ and let $T$ be a full topological Markov
expanding map of the interval. Assume that the Lebesgue measure $\lambda$ is
invariant and ergodic with respect to $T$. Then the family of transition
probabilities of the associated chain of infinite order is given by

\[ p(b \mid a_{-\infty}^{-1}) = \lim_{n \to \infty} \frac{1}{|T'(\omega_n)|} , \]

where $\omega_n$ is any point in $\Omega$ such that

\[ W_0^n(\omega_n) = (b, a_{-1}, \dots, a_{-n}) . \]

For the proof of Theorem 3.1 we refer the reader to [16], [17] and [9].

In the general case, where the absolutely continuous invariant
measure $\mu$ is not the Lebesgue measure $\lambda$, we have the following
result.

Corollary 3.2. Let $T$ be a topological Markov piecewise expanding map
on $\Omega$. Assume that the probability measure $\mu$, which is absolutely continuous
with respect to the Lebesgue measure $\lambda$, is invariant and ergodic
with respect to $T$. Then the family of transition probabilities of the
associated chain of infinite order is given by

\[ p(b \mid a_{-\infty}^{-1}) = \lim_{n \to \infty} \frac{g(\omega_n)}{g(T(\omega_n)) \, |T'(\omega_n)|} , \]

where $g = d\mu/d\lambda$, and $\omega_n$ is any point such that

\[ W_0^n(\omega_n) = (b, a_{-1}, \dots, a_{-n}) . \]
Proof. Let $G$ be the distribution function of $\mu$ defined in the usual way by

\[ G(t) = \mu([0,t]) . \]

Obviously $G$ is a non-decreasing function which is also continuous since
$\mu$ is absolutely continuous with respect to the Lebesgue measure $\lambda$. By
a theorem of Buzzi (1997) (see also Liverani (1995)), $\mu$ is equivalent to
$\lambda$, and $g = d\mu/d\lambda = G'$ is a continuous non-vanishing function. In
other words $G$ is a $C^1$ diffeomorphism.

Consider $G^{-1}$ as a random variable defined on the probability space
$(\Omega, \mathcal{P}_\infty, \lambda)$. This fact together with the invertibility of $G$ implies that
the Lebesgue measure $\lambda$ is invariant and ergodic with respect to the
map $T_0$ defined by

\[ T_0 = G \circ T \circ G^{-1} . \]

Theorem 3.1 applies to $T_0$, and the corollary follows by the chain rule.

□



chains of infinite order, chains with memory of variable length, and maps of the 1 

4. Constructing a map from a chain. 

Let $(X_n)$ be a stationary ergodic stochastic chain taking values in
the finite alphabet $A = \{1, \dots, \mathcal{K}\}$, and defined on a probability space
with probability measure $P$. Let us assume that the law $P$ of the chain has no atom.
Our goal is to define a map $T : \Omega \to \Omega$, where $\Omega = [0,1]$, such that the
construction of Section 3 recovers the chain $(X_n)$. The map $T$ will be
defined by a conjugation to the shift $\mathcal{S}$ through a map $h : A_{-\infty}^{0} \to \Omega$
defined below.

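We define a distance on $A_{-\infty}^{0}$ as follows.

Definition 4.1. First of all, for two sequences $x_{-\infty}^{0}$ and $y_{-\infty}^{0}$, denote
by $\delta(x_{-\infty}^{0}, y_{-\infty}^{0})$ the nearest position to the origin where these two
sequences differ, namely

\[ \delta(x_{-\infty}^{0}, y_{-\infty}^{0}) = \min\{ n \ge 0 : x_{-n} \ne y_{-n} \} . \]

For a fixed number $0 < \zeta < 1$, we define the distance $d$ on $A_{-\infty}^{0}$ by

\[ d(x_{-\infty}^{0}, y_{-\infty}^{0}) = \zeta^{\delta(x_{-\infty}^{0}, y_{-\infty}^{0})} . \]

We denote by $\le$ the lexicographic order on $A_{-\infty}^{0}$. Namely $x_{-\infty}^{0} < y_{-\infty}^{0}$
if, for some $m \ge 0$, we have $x_{-(m-1)}^{0} = y_{-(m-1)}^{0}$ and $x_{-m} < y_{-m}$.
For a point $x_{-\infty}^{0} \in A_{-\infty}^{0}$ we denote by $J(x_{-\infty}^{0})$ the set of points

\[ J(x_{-\infty}^{0}) = \{ y_{-\infty}^{0} \mid y_{-\infty}^{0} \le x_{-\infty}^{0} \} . \]

We define the map $h$ from $A_{-\infty}^{0}$ to $\Omega$ by

\[ h(x_{-\infty}^{0}) = P(J(x_{-\infty}^{0})) . \]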

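Before stating the properties of the map $h$, we need to define a
countable set $\mathcal{D}$ of exceptional codes, given by

\[ \mathcal{D} = \bigcup_{j=1}^{\mathcal{K}-1} \{ \mathcal{K}_{-\infty}^{-1} j ,\ 1_{-\infty}^{-1}(j+1) \} \ \cup\ \bigcup_{k=0}^{\infty} \ \bigcup_{x_{-k}^{0} \in A_{-k}^{0}} \ \bigcup_{j=1}^{\mathcal{K}-1} \{ \mathcal{K}_{-\infty}^{-(k+2)} j \, x_{-k}^{0} ,\ 1_{-\infty}^{-(k+2)} (j+1) \, x_{-k}^{0} \} , \]

where $\mathcal{K}_{-\infty}^{-1}$ and $1_{-\infty}^{-1}$ denote the sequences identically equal to $\mathcal{K}$ and
$1$, respectively.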
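To make the map $h(x_{-\infty}^{0}) = P(J(x_{-\infty}^{0}))$ and the exceptional codes concrete, here is a sketch in the simplest case, assuming the chain is i.i.d. with a hypothetical marginal `MARGINAL` (an assumption made only so that cylinder probabilities are products). Only finitely many symbols are read, so the function returns an interval that contains the true value of $h$.

```python
from fractions import Fraction

# Hypothetical marginal law of an i.i.d. chain on A = {1, 2, 3}.
MARGINAL = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

def h_bounds(x):
    """Bounds on h(x) = P(J(x)) from a finite truncation of x.

    x[n] is the symbol x_{-n}, so x[0] is the symbol at position 0, which is
    the most significant one for the lexicographic order.
    """
    below = Fraction(0)   # mass of sequences already known to be smaller than x
    agree = Fraction(1)   # mass of the cylinder of sequences agreeing with x so far
    for symbol in x:
        smaller = sum((MARGINAL[b] for b in MARGINAL if b < symbol), Fraction(0))
        below += agree * smaller
        agree *= MARGINAL[symbol]
    return below, below + agree

def delta(x, y):
    """First position n with x_{-n} != y_{-n}, or None if the truncations agree."""
    for n, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return n
    return None
```

The width of the returned interval is the probability of the cylinder of the truncated word. The upper bound for a word made of 1 followed by 3's coincides with the lower bound for a word made of 2 followed by 1's: this is the numerical trace of a pair of exceptional codes sharing the same image.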

Proposition 4.2. Let $p$ be a family of transition probabilities satisfying
the non-nullness condition 2.1. The map $h$ defined above has the
following properties:

i) $h$ is non-decreasing on $A_{-\infty}^{0}$ and strictly increasing outside $\mathcal{D}$;

ii) $h$ is continuous;

iii) $h$ is invertible except on the countable set $h(\mathcal{D})$, and the set of
preimages of any point in $h(\mathcal{D})$ has cardinality at most two;

iv) the inverse function $h^{-1}$ is continuous outside $h(\mathcal{D})$;

v) the image of $P$ by $h$ is the Lebesgue measure on $\Omega$;

vi) finally $h$ is surjective.
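Proof. We first prove that the map $h$ is injective except on the countable
set $\mathcal{D}$. Let $x_{-\infty}^{0} < y_{-\infty}^{0}$. This means that $x_0 < y_0$, or there exists
an integer $k \ge 0$ such that $x_{-k}^{0} = y_{-k}^{0}$ and $x_{-(k+1)} < y_{-(k+1)}$. Assume
$x_{-\infty}^{0} \notin \mathcal{D}$. This implies that for infinitely many indices $n$, we have
$x_{-n} \le \mathcal{K} - 1$. Let $m > k + 1$ be such an index. For any $z_{-\infty}^{0}$ in the cylinder
$C(\mathcal{K} x_{-(m-1)}^{0})$ we have

\[ x_{-\infty}^{0} < z_{-\infty}^{0} < y_{-\infty}^{0} . \]

Therefore

\[ J(x_{-\infty}^{0}) \cap C(\mathcal{K} x_{-(m-1)}^{0}) = \emptyset , \quad \text{and} \quad C(\mathcal{K} x_{-(m-1)}^{0}) \subset J(y_{-\infty}^{0}) . \]

From Theorem 2.6 we have $P(C(\mathcal{K} x_{-(m-1)}^{0})) > 0$, hence

\[ h(x_{-\infty}^{0}) = P(J(x_{-\infty}^{0})) < P(J(y_{-\infty}^{0})) = h(y_{-\infty}^{0}) . \]

The case where $y_{-\infty}^{0} \notin \mathcal{D}$ can be treated similarly.

If $x_{-\infty}^{0} \in \mathcal{D}$ and $y_{-\infty}^{0} \in \mathcal{D}$ but, for any $a \in \{1, \dots, \mathcal{K}-1\}$,

\[ (x_{-\infty}^{0}, y_{-\infty}^{0}) \ne ( \mathcal{K}_{-\infty}^{-1} a ,\ 1_{-\infty}^{-1}(a+1) ) \]

and for any $k \ge 0$,

\[ (x_{-\infty}^{0}, y_{-\infty}^{0}) \ne ( \mathcal{K}_{-\infty}^{-(k+2)} a \, x_{-k}^{0} ,\ 1_{-\infty}^{-(k+2)} (a+1) \, x_{-k}^{0} ) , \]

then there exists $\tilde{x}_{-\infty}^{0} \notin \mathcal{D}$ such that $x_{-\infty}^{0} < \tilde{x}_{-\infty}^{0} < y_{-\infty}^{0}$. From
above it follows that

\[ h(x_{-\infty}^{0}) < h(\tilde{x}_{-\infty}^{0}) < h(y_{-\infty}^{0}) . \]

Finally, if for some $a \in \{1, \dots, \mathcal{K}-1\}$ we have either

\[ x_{-\infty}^{0} = \mathcal{K}_{-\infty}^{-1} a \quad \text{and} \quad y_{-\infty}^{0} = 1_{-\infty}^{-1}(a+1) , \]

or

\[ x_{-\infty}^{0} = \mathcal{K}_{-\infty}^{-(k+2)} a \, x_{-k}^{0} \quad \text{and} \quad y_{-\infty}^{0} = 1_{-\infty}^{-(k+2)} (a+1) \, x_{-k}^{0} \]

for some $k \ge 0$, then $h(x_{-\infty}^{0}) = h(y_{-\infty}^{0})$. This concludes the proof of
(i).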

Now let us prove that the map $h$ is continuous. Take $x_{-\infty}^{0} \in A_{-\infty}^{0}$,
and let $(y_{-\infty}^{0}(n))$ be a sequence in $A_{-\infty}^{0}$ converging to $x_{-\infty}^{0}$ in the metric
defined in Definition 4.1. This implies that for any $k$ there exists $n(k)$ such that
for any $n > n(k)$, $y_{-k}^{0}(n) = x_{-k}^{0}$. This implies

\[ J(y_{-\infty}^{0}(n)) \,\Delta\, J(x_{-\infty}^{0}) \subset C(x_{-k}^{0}) , \]

and therefore

\[ |h(y_{-\infty}^{0}(n)) - h(x_{-\infty}^{0})| \le P(C(x_{-k}^{0})) . \]

By Theorem 2.6 the probability measure $P$ has no atoms, hence $P(C(x_{-k}^{0}))$
tends to $0$ when $k$ tends to $\infty$, proving that $h$ is continuous. This
concludes the proof of (ii).

Assertions (iii) and (iv) follow immediately from (i) and (ii).

Finally, to prove (v), take $z \in \Omega \setminus h(\mathcal{D})$. The inverse value $h^{-1}(z)$ is
uniquely defined, and therefore

\[ \lambda([0,z]) = z = h(h^{-1}(z)) = P(J(h^{-1}(z))) = P(h^{-1}([0,z])) . \]

Since the measure $P$ and the Lebesgue measure have no atoms, the
same result holds for the countable set of points in $h(\mathcal{D})$. This implies
by standard measure theoretic arguments (see for example Breiman
1992) that $\lambda$ is the image of $P$ by $h$. This concludes the proof of (v).

Finally, (vi) follows from the fact that, by Theorem 2.6, the measure $P$
has no atom, so that the map $h$ is continuous and hence surjective. □

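We define the map $T$ on $\Omega \setminus h(\mathcal{D})$ by

\[ T = h \circ \mathcal{S} \circ h^{-1} . \]

More explicitly, for $z \in \Omega \setminus h(\mathcal{D})$ we have

(4.1) \[ T(z) = P(J(\mathcal{S} h^{-1}(z))) = P(\mathcal{S} J(h^{-1}(z))) . \]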
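The conjugation $T = h \circ \mathcal{S} \circ h^{-1}$ can be checked by hand in the i.i.d. case. The sketch below is an illustration under that assumption (the marginal `MARGINAL` is hypothetical): $h$ sends the cylinder of a finite word to an interval, dropping the symbol at position 0 corresponds to applying $T$, and on that interval $T$ is affine with slope equal to the inverse of the probability of the dropped symbol.

```python
from fractions import Fraction

MARGINAL = {1: Fraction(1, 4), 2: Fraction(3, 4)}   # hypothetical i.i.d. law on {1, 2}

def h_interval(x):
    """Image under h of the cylinder of the finite word x, with x[n] = x_{-n}."""
    low, width = Fraction(0), Fraction(1)
    for symbol in x:
        low += width * sum((MARGINAL[b] for b in MARGINAL if b < symbol), Fraction(0))
        width *= MARGINAL[symbol]
    return low, low + width

def shift(x):
    """The shift drops the symbol at position 0 and re-indexes the past."""
    return x[1:]

def T_on_cylinder(x):
    """Interval h(S(cylinder of x)), i.e. the image by T of h(cylinder of x)."""
    return h_interval(shift(x))

def slope(x):
    """Ratio of the lengths of T(I) and I for I = h(cylinder of x)."""
    low, high = h_interval(x)
    t_low, t_high = T_on_cylinder(x)
    return (t_high - t_low) / (high - low)
```

In this example `slope` depends only on `x[0]`, so the map is piecewise affine with two branches of slopes 4 and 4/3; for a chain with longer memory the same ratio would depend on more symbols of the word.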

Theorem 4.3. Let $p$ be a family of transition probabilities satisfying
the non-nullness and the continuity conditions 2.1 and 2.3. Then

(1) The map $T$ defined above can be continuously extended to a monotone
increasing map on each interval $I_j = ]\eta_{j-1}, \eta_j[$, with $j =
1, \dots, \mathcal{K}$, with end points $0 = \eta_0 < \eta_1 < \dots < \eta_{\mathcal{K}} = 1$ defined
by

\[ \eta_k = h(\mathcal{K}_{-\infty}^{-1} k) = h(1_{-\infty}^{-1}(k+1)) , \quad \text{for } k = 1, \dots, \mathcal{K} - 1 . \]

(2) The extended map (also denoted by $T$) is a topological Markov
map and the Lebesgue measure is invariant by $T$ and ergodic.
Moreover the regular versions of the conditional probabilities
associated to the sequence of dynamical partitions are given by
$p$.

(3) The map $T$ is differentiable outside $h(\mathcal{D})$ and for each $\omega \in
\Omega \setminus h(\mathcal{D})$ we have

\[ T'(\omega) = \frac{1}{p( h^{-1}(\omega)_0 \mid h^{-1}(\omega)_{-\infty}^{-1} )} . \]

In this formula we denote the successive elements of the sequence
$h^{-1}(\omega) \in A_{-\infty}^{0}$ by $h^{-1}(\omega)_{-\infty}^{0}$.

(4) For $\omega \in h(\mathcal{D})$, with

\[ \omega = h(\mathcal{K}_{-\infty}^{-(k+2)} a \, z_{-k}^{0}) = h(1_{-\infty}^{-(k+2)} (a+1) \, z_{-k}^{0}) , \]

for some $a \in \{1, \dots, \mathcal{K}-1\}$ and some integer $k \ge -1$, the
left and right derivatives of $T$ at $\omega$ exist and are given by

\[ \frac{1}{p(z_0 \mid z_{-k}^{-1} a \, \mathcal{K}_{-\infty}^{-(k+2)})} \quad \text{and} \quad \frac{1}{p(z_0 \mid z_{-k}^{-1} (a+1) \, 1_{-\infty}^{-(k+2)})} , \]

respectively.

(5) In particular, if $p$ is such that for any $a \in \{1, \dots, \mathcal{K} - 1\}$,
any integer $k \ge 0$ and any $z_{-k}^{0}$, we have

(4.2) \[ p(z_0 \mid z_{-k}^{-1} a \, \mathcal{K}_{-\infty}^{-(k+2)}) = p(z_0 \mid z_{-k}^{-1} (a+1) \, 1_{-\infty}^{-(k+2)}) , \]

then the map $T$ is piecewise $C^1$.

(6) If the continuity rate $\beta_k$, defined in Definition 2.2, decays exponentially
fast, and conditions (4.2) are satisfied, then the map $T$ is piecewise
$C^{1+\alpha}$, where $\alpha > 0$ depends on the exponential rate of
decay of $\beta_k$.

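The proof of Theorem 4.3 will use several times the following lemma.

Lemma 4.4. For any pair of points $u < v$ in $\Omega \setminus h(\mathcal{D})$ belonging
to the same monotonicity interval $I_j = ]\eta_{j-1}, \eta_j[$ of $T$, for any $j =
1, \dots, \mathcal{K}$, we have

\[ T(v) - T(u) = \int_u^v \frac{d\lambda(\omega)}{p( h^{-1}(\omega)_0 \mid h^{-1}(\omega)_{-\infty}^{-1} )} . \]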
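Proof. By definition

\[ \int_u^v \frac{d\lambda(\omega)}{p( h^{-1}(\omega)_0 \mid h^{-1}(\omega)_{-\infty}^{-1} )} = \int_{h^{-1}([u,v])} \frac{dP(z_{-\infty}^{0})}{p(z_0 \mid z_{-\infty}^{-1})} \]

(4.3) \[ = \int \frac{ \mathbf{1}_{J(h^{-1}(v)) \setminus J(h^{-1}(u))}(z_{-\infty}^{0}) }{ p(z_0 \mid z_{-\infty}^{-1}) } \, dP(z_{-\infty}^{0}) . \]

In the above formula, $\mathbf{1}_{J(h^{-1}(v)) \setminus J(h^{-1}(u))}$ denotes the characteristic
function of the set $J(h^{-1}(v)) \setminus J(h^{-1}(u))$. Since $u$ and $v$ by hypothesis belong
to the same monotonicity interval, we have that

\[ h^{-1}(u)_0 = h^{-1}(v)_0 . \]

Let $f : A_{-\infty}^{0} \to \mathbb{R}$ be the function

\[ f(z_{-\infty}^{0}) = \frac{ \mathbf{1}_{J(h^{-1}(v)) \setminus J(h^{-1}(u))}(z_{-\infty}^{0}) }{ p(z_0 \mid z_{-\infty}^{-1}) } = \frac{ \mathbf{1}_{\{z_0 = h^{-1}(v)_0\}} \, \mathbf{1}_{\mathcal{S}( J(h^{-1}(v)) \setminus J(h^{-1}(u)) )}(z_{-\infty}^{-1}) }{ p(z_0 \mid z_{-\infty}^{-1}) } . \]

Using the invariance of $P$ (see (2.1)) with the function $f$, we can rewrite
the integral (4.3) as

\[ \int \frac{ \mathbf{1}_{J(h^{-1}(v)) \setminus J(h^{-1}(u))}(z_{-\infty}^{0}) }{ p(z_0 \mid z_{-\infty}^{-1}) } \, dP(z_{-\infty}^{0}) = \int \mathbf{1}_{\mathcal{S}( J(h^{-1}(v)) \setminus J(h^{-1}(u)) )}(z_{-\infty}^{-1}) \, dP(z_{-\infty}^{-1}) . \]

Now we observe that

\[ \mathcal{S}(J(h^{-1}(u))) \subset \mathcal{S}(J(h^{-1}(v))) \]

and therefore

\[ \int \mathbf{1}_{\mathcal{S}( J(h^{-1}(v)) \setminus J(h^{-1}(u)) )}(z_{-\infty}^{-1}) \, dP(z_{-\infty}^{-1}) = P\{ \mathcal{S}(J(h^{-1}(v))) \} - P\{ \mathcal{S}(J(h^{-1}(u))) \} . \]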
Now it is enough to use equality (4.1) to conclude the proof. □

We can now prove Theorem 4.3.

Proof. Assertion 1 of the theorem follows directly from Lemma 4.4.
For Assertion 2, we start by observing that for $i = 1, \dots, \mathcal{K}$, we have

\[ \lim_{\omega \nearrow \eta_i} T(\omega) = 1 , \]

and for $i = 0, \dots, \mathcal{K} - 1$,

\[ \lim_{\omega \searrow \eta_i} T(\omega) = 0 . \]

The topological Markov property follows from the piecewise monotonicity
of $T$.

The invariance and ergodicity of the Lebesgue measure $\lambda$ follow
from the fact that $T$ and the shift $\mathcal{S}$ are conjugated by $h$.

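To prove that $p$ is the regular version of the conditional probability
we start with the equality

\[ \lambda(I_{x_{-k}^{0}}) = P(C(x_{-k}^{0})) , \]

where

\[ I_{x_{-k}^{0}} = \{ \omega \mid T^j(\omega) \in I_{x_{-j}} ,\ j = 0, \dots, k \} . \]

Therefore, for any $x_{-\infty}^{0} \in A_{-\infty}^{0}$,

\[ \lim_{k \to \infty} \frac{\lambda(I_{x_{-k}^{0}})}{\lambda(T(I_{x_{-k}^{0}}))} = \lim_{k \to \infty} \frac{P(C(x_{-k}^{0}))}{P(C(x_{-k}^{-1}))} = p(x_0 \mid x_{-\infty}^{-1}) , \]

where the last equality follows from the continuity of the family of
transition probabilities $p$.

Assertions 3, 4 and 5 follow directly from Lemma 4.4, and the finiteness
of the derivative follows from the non-nullness assumption.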

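To prove Assertion 6, we first observe that the exponential decay of
the continuity rate $\beta_k$ implies that there exist two constants $C > 0$
and $0 < \rho < 1$ such that for any $k \ge 1$

(4.4) \[ \beta_k \le C \rho^k . \]

Let

\[ \gamma = \frac{1}{\sup_{x_{-\infty}^{0} \in A_{-\infty}^{0}} p(x_0 \mid x_{-\infty}^{-1})} \]

and

\[ \Gamma = \frac{1}{\inf_{z_{-\infty}^{0} \in A_{-\infty}^{0}} p(z_0 \mid z_{-\infty}^{-1})} . \]

From the non-nullness assumption it follows immediately that $\gamma > 1$
and $\Gamma < \infty$. For $\omega$ and $\omega'$ in the same interval of monotonicity $I_j$, let

\[ m = \delta(h^{-1}(\omega), h^{-1}(\omega')) , \]

where $\delta$ was defined in Definition 4.1. Let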



PIERRE COLLET AND ANTONIO GALVES

where [ ] denotes the integer part. 

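We first consider the case $m \ge M$. Then from (4.4) we have

\[ | p( h^{-1}(\omega)_0 \mid h^{-1}(\omega)_{-\infty}^{-1} ) - p( h^{-1}(\omega')_0 \mid h^{-1}(\omega')_{-\infty}^{-1} ) | \le C \rho^{\delta(h^{-1}(\omega), h^{-1}(\omega'))} \]

\[ \le C \rho^{-1} \rho^{-\log|\omega - \omega'| / \log \Gamma} = C \rho^{-1} |\omega - \omega'|^{-\log \rho / \log \Gamma} . \]

This implies that

\[ |T'(\omega) - T'(\omega')| \le \Gamma^2 C \rho^{-1} |\omega - \omega'|^{-\log \rho / \log \Gamma} . \]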



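We now consider the case $m < M$. If

\[ |\omega - \omega'|^{1/2} \, \Gamma^m \ge \min\{ \lambda(I_1), \lambda(I_{\mathcal{K}}) \} \]

we have

\[ m \ge - \frac{1}{2} \frac{\log|\omega - \omega'|}{\log \Gamma} + \frac{\log \min\{ \lambda(I_1), \lambda(I_{\mathcal{K}}) \}}{\log \Gamma} . \]

The same estimate as before implies

\[ |T'(\omega) - T'(\omega')| \le \Gamma^2 C \rho^{-1} \rho^{\log \min\{ \lambda(I_1), \lambda(I_{\mathcal{K}}) \} / \log \Gamma} \, |\omega - \omega'|^{-\log \rho / (2 \log \Gamma)} . \]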

Finally if 

we have, assuming u' > u, that 

h-\ u ) = / i - 1 (c,):r 2 " A//2 /c:^l_ M/2 / i - 1 (c,)- m / i - 1 (u;)° m+1 

and 

h-\u') = ^ 1 (a;0:r 2 ~ M/2 i::: 1 -M/ 2 (^ 1 (^)-- + i)^ 1 (^)V 1 • 

From inequality ( 14.41) we get 

|p(^ 1 (c) | ^ 1 (a;):^ +1 /i- 1 (a;)_ ni x:i-:;_ M/2 /i- 1 (a;):r 2_M/2 ) 

-p(/ i - 1 (c) I ^ 1 (c):^ +1 / i - 1 (u;)_ m /C:-- 1 | < Cp m+A/ / 2 

and 

|p(/tV)q | ^ 1 (^):™ + i^ 1 (^)- m i:^ 1 _M/ 2 ^ 1 (^ , ):r 2 " M/2 ) 

-^(aOol^CwO^x^^O-mlir 1 ! < Cp m+M / 2 . 
Observing that 

and using Assumption 14.21 we obtain 

-p(/tV)o I ^ 1 (^ / ):^ +1 ^ 1 (^)-r,i:^ 1 _M /2 ^ 1 (^):r 2 ~ M/2 )| 

< 2Cp m+M/2 . 
The conclusion follows as in the two other cases.

□ 

5. The case of chains with memory of variable length. 

Stochastic chains with memory of variable length appeared in the
pioneering paper by Rissanen (1983) as a universal system for data
compression. We briefly recall the definition of this class of stochastic
chains.

Given a finite alphabet $A$, we define the basic notion of context tree.

Definition 5.1. A set of strings

\[ \tau \subset \bigcup_{k \ge 1} A_{-k}^{-1} \cup A_{-\infty}^{-1} \]

is a context tree if

(1) $\bigcup_{w \in \tau} C(w) = A_{-\infty}^{-1}$;

(2) for any pair $w$ and $w'$ of elements of $\tau$, if $w \ne w'$, then $C(w) \cap
C(w') = \emptyset$.

In the above definition $w$ and $w'$ denote two sequences, either finite
or infinite, and $C(w)$ is the set of all elements of $A_{-\infty}^{-1}$ having the string
$w$ as a suffix, i.e. having $w$ as final sequence. In case $w$ is finite, $C(w)$
is a cylinder. In case $w$ is infinite, $C(w)$ is the singleton whose unique
element is $w$. The name context tree comes from the fact that $\tau$ can
be described by the leaves of a rooted tree. The strings belonging to $\tau$
are called contexts.

Definition 5.2. A probabilistic context tree is a pair $(\tau, p)$, where $\tau$ is
a context tree and

\[ p = \{ p(\cdot \mid w) : w \in \tau \} \]

is a family indexed by $\tau$ of probability measures on the set $A$.

Given a probabilistic context tree $(\tau, p)$, we define a family of infinite
order transition probabilities $p$ on $A$ as follows. For any sequence
$x_{-\infty}^{-1} \in A_{-\infty}^{-1}$, and for any symbol $a \in A$,

(5.1) \[ p(a \mid x_{-\infty}^{-1}) = p(a \mid w) , \]

where $w$ is the unique element of $\tau$ such that $x_{-\infty}^{-1} \in C(w)$.

Definition 5.3. A stochastic chain of infinite order is said to have a
memory of variable length described by a probabilistic context tree $(\tau, p)$
if its family of transition probabilities satisfies condition (5.1).

Intuitively speaking in a chain with memory of variable length, at 
each time step, to predict the next symbol, it is enough to use the past 
steps corresponding to the context associated to this past. 
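The lookup described above can be sketched as follows, assuming a small hypothetical probabilistic context tree on $A = \{1, 2\}$ (the contexts and probabilities in `TREE` are invented for illustration). A context is stored as a tuple ordered from the oldest to the most recent symbol, so it must match the end of the observed past.

```python
from fractions import Fraction

# Hypothetical probabilistic context tree: context -> law of the next symbol.
TREE = {
    (2,):   {1: Fraction(3, 10), 2: Fraction(7, 10)},  # pasts ending with 2
    (2, 1): {1: Fraction(1, 2),  2: Fraction(1, 2)},   # pasts ending with 2, 1
    (1, 1): {1: Fraction(9, 10), 2: Fraction(1, 10)},  # pasts ending with 1, 1
}

def find_context(past):
    """Return the unique context of TREE that is a suffix of `past`."""
    matches = [w for w in TREE
               if len(w) <= len(past) and tuple(past[len(past) - len(w):]) == w]
    if len(matches) != 1:
        raise ValueError("past is too short to determine its context")
    return matches[0]

def transition(a, past):
    """p(a | past) = p(a | w), where w is the context of `past`."""
    return TREE[find_context(past)][a]
```

The three contexts partition the set of pasts: a past ends with 2, or with 2 followed by 1, or with two 1's. At most two symbols of the past are ever needed in this example, although the number of symbols actually read varies with the past.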

The question we address in this section is to characterize the maps 
associated to transition probabilities defined by a probabilistic context 
tree. This is the content of the following theorem.

Theorem 5.4. Let $T$ be a topological Markov expanding map of the interval
with alphabet of monotonicity intervals $A$, and with the Lebesgue
measure invariant and ergodic. Assume there is a tree of contexts $\tau$ on
the alphabet $A$ such that

\[ \sum_{k=1}^{\infty} \ \sum_{x_{-k}^{-1} \in \tau} \lambda(C(x_{-k}^{-1})) = 1 , \]

and for any $x_{-k}^{-1} \in \tau$, for any $a \in A$ and for any $\omega$ and $\omega'$ satisfying

\[ W(\omega)_0^k = a \, x_{-1} \dots x_{-k} , \quad \text{and} \quad W(\omega')_0^k = a \, x_{-1} \dots x_{-k} , \]

we have

\[ T'(\omega) = T'(\omega') . \]

Then the family of transition probabilities associated to the map $T$ by
Theorem 3.1 is a chain with variable length whose contexts are almost
surely finite.

Conversely, given a family of transition probabilities which is a chain
of variable length with almost surely finite contexts (for an invariant
measure), then the associated map by (4.1) (see also Theorem 4.3) is
piecewise affine with derivatives satisfying the above property.

Proof. The result follows directly from Theorems 3.1 and 4.3. □